UK Road Safety: Traffic Accidents and Vehicles Visualizations and Solution

The goal of this project is to investigate what causes Serious and Fatal accidents, in the hope of preventing them and reducing their number. The dataset consists of accident records from the UK over the course of 15+ years. I hope to show the causes of these accidents through visualizations and to create an algorithm that can predict the severity of accidents.

The UK government collects and publishes (usually on an annual basis) detailed information about traffic accidents across the country. This information includes, but is not limited to, geographical locations, weather conditions, type of vehicles, number of casualties and vehicle manoeuvres, making this a very interesting and comprehensive dataset for analysis and research.

The data that I'm using is compiled and available through Kaggle and, in a less compiled form, here.

Problem: Traffic Accidents
Solution Method: Use data to figure out how to lower the number of accidents and the severity of them.

Importing and Data Merging

In [1]:
#Import modules
import numpy as np
import holidays
import pandas as pd
import seaborn as sns
import pickle
import time
import timeit


import matplotlib.pyplot as plt
plt.style.use('dark_background')
%matplotlib inline

import datetime
import math
from collections import Counter

#scipy
from scipy import stats
from scipy.stats import chi2_contingency

#sklearn
import sklearn
from sklearn import ensemble
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, ExtraTreesClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score 
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler, OrdinalEncoder
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

#for clustering
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

#other learners
from xgboost import XGBClassifier
import lightgbm as lgb
from kmodes.kmodes import KModes

#imblearn
from imblearn.ensemble import BalancedBaggingClassifier
from imblearn.ensemble import EasyEnsembleClassifier
from imblearn.ensemble import BalancedRandomForestClassifier

#webscraping
import requests
from bs4 import BeautifulSoup
import re
import urllib
from IPython.core.display import HTML

#time series
import statsmodels.api as sm
from pylab import rcParams
import itertools
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima_model import ARIMA


#warning ignorer
import warnings
warnings.filterwarnings("ignore")
In [2]:
# #DATAFRAME PICKLE CREATED IN CELLS BELOW INSTEAD OF RUNNING THROUGH ENTIRE PROCESS AFTER RESTARTING
# #import pickled file
# df = pd.read_pickle("df.pkl")

General Visualizations

In [278]:
accidentsperyear = df.groupby(['year'])['accident_index'].count()

# prepare plot
plt.style.use('dark_background')
plt.figure(figsize=(10,5))
colors = sns.color_palette("plasma", n_colors=7)
sns.barplot(x=accidentsperyear.index, y=accidentsperyear.values, palette=colors)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.title("Accidents Per Year",fontsize=20,fontweight="bold")
plt.xlabel("\nYear", fontsize=15, fontweight="bold")
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
#apply tight_layout before saving so the saved image uses the adjusted layout
plt.tight_layout()
plt.savefig('accidentsperyear.png')
In [277]:
accidentspermonth = df.groupby(['month'])['accident_index'].count()

# prepare plot
plt.style.use('dark_background')
plt.figure(figsize=(20,10))
colors = sns.color_palette("plasma_r", n_colors=12)
mt=sns.barplot(x=accidentspermonth.index, y=accidentspermonth.values, palette=colors)
sns.despine(top=True, right=True, left=True, bottom=True)
#mt is the axes instance returned by sns.barplot
group_labels = ['Jan', 'Feb','Mar','Apr','May','June','July','Aug','Sept','Oct','Nov','Dec' ]

mt.set_xticklabels(group_labels)
plt.title("Accidents Per Month",fontsize=20,fontweight="bold")
plt.xticks(fontsize=18)
plt.yticks(fontsize=12)
plt.xlabel("\nMonth", fontsize=15, fontweight="bold")
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
#apply tight_layout before saving so the saved image uses the adjusted layout
plt.tight_layout()
plt.savefig('accidentspermonth.png')
In [276]:
weekdays = ['Monday', 'Tuesday','Wednesday','Thursday', 'Friday', 'Saturday', 'Sunday']
accweekday = df.groupby(['year', 'day_of_week']).size()
accweekday = accweekday.rename_axis(['year', 'day_of_week'])\
                               .unstack('day_of_week')\
                               .reindex(columns=weekdays)
plt.figure(figsize=(15,10))
plt.style.use('dark_background')
sns.heatmap(accweekday, cmap='plasma_r')
plt.title('\nAccidents by Weekday per Year\n', fontsize=14, fontweight='bold')
plt.xticks(fontsize=15)
plt.yticks(fontsize=12)
plt.xlabel('')
plt.ylabel('')
plt.savefig('accidentsbyweekdayperyear.png')
plt.show()

Friday is the day of the week on which the most accidents occur.
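
The heatmap reading can also be checked numerically. The following is a minimal sketch on made-up rows, not the real dataset; the toy dataframe and the names per_day, busiest_day and busiest_by_year are assumptions for illustration. It counts accidents per weekday overall and then per year.

```python
import pandas as pd

# Toy stand-in for the accident records (values are made up for illustration)
toy = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016, 2016, 2016],
    "day_of_week": ["Friday", "Monday", "Friday",
                    "Friday", "Sunday", "Monday", "Friday"],
})

# Total accidents per weekday across all years
per_day = toy["day_of_week"].value_counts()
busiest_day = per_day.idxmax()

# Busiest weekday within each year (same table shape as the heatmap input)
per_year = toy.groupby(["year", "day_of_week"]).size().unstack(fill_value=0)
busiest_by_year = per_year.idxmax(axis=1)

print(busiest_day)
print(busiest_by_year)
```

Running the same two steps on df would show whether Friday leads in every year or only in the overall total.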

In [273]:
accidentsperseason = df.groupby(['season'])['accident_index'].count()
seaord=['spring', 'summer', 'fall','winter']
# prepare plot
plt.style.use('dark_background')
plt.figure(figsize=(15,10))

sns.barplot(x=accidentsperseason.index, y=accidentsperseason.values, order=seaord, 
            saturation=1, palette='magma_r')
sns.despine(top=True, right=True, left=True, bottom=True)
plt.title("Accidents Per Season",fontsize=20,fontweight="bold")
plt.xticks(fontsize=15)
plt.yticks(fontsize=12)
plt.xlabel("\nSeason", fontsize=15, fontweight="bold")
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
plt.tight_layout()
plt.savefig('accidentsperseason.png')
In [17]:
#"Morning Rush (6-10)", "Day (10-12)", "Lunch Rush (12-14)","Afternoon (14-16)",
#"After Work Rush (16-18)", "Evening (18-22)", "Night (22-6)"

timeofdaygroups = {'1': "Morning Rush",
                   '2': "Day",
                   '3': "Lunch Rush",
                   '4': "Afternoon",
                   '5': "After Work Rush",
                   '6': "Evening",
                   '7': "Night"}
df['time_of_day']=df['time_of_day'].map(timeofdaygroups)
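
The cell that originally assigned the time-of-day codes is not shown in this section. As a rough sketch of how an hour could be mapped to the seven groups listed in the comment above, the hypothetical helper time_of_day_label below assumes half-open intervals (for example, 10:00 counts as "Day", not "Morning Rush"); the actual boundary handling may differ.

```python
def time_of_day_label(hour):
    """Map an hour (0-23) to one of the seven time-of-day group names.

    Assumption: each range includes its start hour and excludes its end hour.
    """
    if 6 <= hour < 10:
        return "Morning Rush"
    if 10 <= hour < 12:
        return "Day"
    if 12 <= hour < 14:
        return "Lunch Rush"
    if 14 <= hour < 16:
        return "Afternoon"
    if 16 <= hour < 18:
        return "After Work Rush"
    if 18 <= hour < 22:
        return "Evening"
    # 22:00-05:59 wraps past midnight, so it is the fall-through case
    return "Night"


labels = [time_of_day_label(h) for h in (7, 11, 13, 15, 17, 20, 23, 3)]
print(labels)
```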
In [267]:
accidentspertod = df.groupby(['time_of_day'])['accident_index'].count()

# prepare plot
plt.style.use('dark_background')
plt.figure(figsize=(15,10))
tod=["Morning Rush", "Day", "Lunch Rush", "Afternoon",
     "After Work Rush", "Evening", "Night"]
sns.barplot(x=accidentspertod.index, y=accidentspertod.values, order=tod, palette='rainbow')
sns.despine(top=True, right=True, left=True, bottom=True)
plt.title("Accidents Per Time of Day",fontsize=20,fontweight="bold")
plt.xticks(fontsize=15)
plt.yticks(fontsize=12)

plt.xlabel("", fontsize=15, fontweight="bold")
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
plt.tight_layout()
plt.savefig('accidentspertod.png')

Accident Forecasting with Tableau

In [1]:
%%HTML
<div class='tableauPlaceholder' id='viz1572069627501' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ac&#47;AccidentForecasting&#47;AccidentForecasting&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='AccidentForecasting&#47;AccidentForecasting' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ac&#47;AccidentForecasting&#47;AccidentForecasting&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1572069627501');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

Correlations

In [22]:
#correlation by accident severity pearson

corrdf=df.apply(LabelEncoder().fit_transform)
sc = StandardScaler()
corrdf = sc.fit_transform(corrdf)
corrdf=pd.DataFrame(data=corrdf,columns=df.columns)
corr=corrdf.corr()['accident_seriousness']
corr.sort_values(ascending=False)
Out[22]:
accident_seriousness                           1.000000
vehicle_type                                   0.133941
x1st_point_of_impact                           0.076099
speed_limit                                    0.065554
skidding_and_overturning                       0.059322
vehicle_leaving_carriageway                    0.058977
sex_of_driver                                  0.054505
lsoa_of_accident_location                      0.048548
number_of_casualties                           0.043294
age_band_of_driver                             0.039851
junction_control                               0.037966
hit_object_off_carriageway                     0.036040
time_of_day                                    0.026101
model                                          0.022808
accident_index                                 0.022391
junction_location                              0.019548
road_type                                      0.019443
engine_capacity_cc_size                        0.019305
driver_imd_decile                              0.017195
weekend                                        0.017001
date                                           0.016821
propulsion_code                                0.016685
junction_detail                                0.016134
year                                           0.016002
age_of_vehicle                                 0.014322
inscotland                                     0.012642
vehicle_locationrestricted_lane                0.011370
month                                          0.006929
latitude                                       0.006853
carriageway_hazards                            0.004003
1st_road_number                                0.003862
towing_and_articulation                        0.003658
time                                           0.003062
hour                                           0.002450
local_authority_district                       0.002129
pedestrian_crossing-human_control             -0.001036
day_of_week                                   -0.001781
make                                          -0.002377
special_conditions_at_site                    -0.002652
was_vehicle_left_hand_drive                   -0.003057
1st_road_class                                -0.004179
journey_purpose_of_driver                     -0.005085
local_authority_highway                       -0.005379
season                                        -0.007647
2nd_road_number                               -0.008978
police_force                                  -0.009640
hit_object_in_carriageway                     -0.010963
pedestrian_crossing-physical_facilities       -0.012054
light_conditions                              -0.012499
road_surface_conditions                       -0.015441
longitude                                     -0.024353
weather_conditions                            -0.029648
vehicle_reference                             -0.037848
driver_home_area_type                         -0.041281
engine_capacity_cc                            -0.047446
vehicle_manoeuvre                             -0.048098
urban_or_rural_area                           -0.065074
number_of_vehicles                            -0.066120
did_police_officer_attend_scene_of_accident   -0.086731
accident_severity                             -0.973745
Name: accident_seriousness, dtype: float64
In [23]:
corr_spear=corrdf.corr(method='spearman')['accident_seriousness']
corr_spear.sort_values(ascending=False)
Out[23]:
accident_seriousness                           1.000000
vehicle_type                                   0.114374
vehicle_leaving_carriageway                    0.071079
x1st_point_of_impact                           0.067697
speed_limit                                    0.062779
skidding_and_overturning                       0.059746
sex_of_driver                                  0.054629
lsoa_of_accident_location                      0.048538
junction_control                               0.041690
age_band_of_driver                             0.037758
hit_object_off_carriageway                     0.034738
road_type                                      0.028215
time_of_day                                    0.026387
junction_location                              0.024224
junction_detail                                0.024051
accident_index                                 0.022667
number_of_casualties                           0.022535
engine_capacity_cc_size                        0.022157
model                                          0.021979
date                                           0.017136
weekend                                        0.017001
driver_imd_decile                              0.016911
propulsion_code                                0.016522
year                                           0.016282
inscotland                                     0.012642
vehicle_locationrestricted_lane                0.010124
age_of_vehicle                                 0.009042
time                                           0.007971
hour                                           0.007432
month                                          0.006572
1st_road_number                                0.006437
latitude                                       0.006432
towing_and_articulation                        0.004320
carriageway_hazards                            0.003980
local_authority_district                       0.001801
make                                           0.000794
special_conditions_at_site                     0.000097
journey_purpose_of_driver                     -0.000857
day_of_week                                   -0.001860
pedestrian_crossing-human_control             -0.001870
was_vehicle_left_hand_drive                   -0.003057
1st_road_class                                -0.003177
local_authority_highway                       -0.005600
season                                        -0.007485
police_force                                  -0.011075
road_surface_conditions                       -0.015850
pedestrian_crossing-physical_facilities       -0.016310
hit_object_in_carriageway                     -0.017504
light_conditions                              -0.019421
longitude                                     -0.024395
2nd_road_number                               -0.027880
weather_conditions                            -0.028520
engine_capacity_cc                            -0.036678
driver_home_area_type                         -0.041308
vehicle_manoeuvre                             -0.046600
vehicle_reference                             -0.049230
urban_or_rural_area                           -0.065074
number_of_vehicles                            -0.078513
did_police_officer_attend_scene_of_accident   -0.086918
accident_severity                             -0.999548
Name: accident_seriousness, dtype: float64

Looking at these results, I wanted to visualize some of the stronger positive and negative correlations against accident severity.

Chi-Squared Test

Before creating these visualizations, I wanted to be sure that the features being visualized had some bearing on accident_seriousness. For this, the chi-squared test was used.
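
To make the test concrete, here is a minimal sketch of a chi-squared test of independence on a small contingency table. The counts are made up for illustration and the variable names (observed, alpha, dependent) are assumptions; only scipy.stats.chi2_contingency is the real call.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up counts: rows are seriousness classes, columns are area types
observed = pd.DataFrame(
    {"Urban": [300, 30], "Rural": [150, 45]},
    index=["Not Serious", "Serious"],
)

# chi2_contingency returns the statistic, the p-value, the degrees of
# freedom and the counts expected if the two variables were independent
chi2, p, dof, expected = chi2_contingency(observed.values)

alpha = 0.05
dependent = p < alpha  # True means independence is rejected at level alpha
print(f"chi2={chi2:.2f}, p={p:.4g}, dof={dof}, reject independence: {dependent}")
```

A small p-value only indicates that the two columns are unlikely to be independent. With several hundred thousand rows, even weak associations tend to fall below 0.05, so the result says little about how strong the relationship is.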

In [41]:
"""chisquare algorithm from 
http://www.insightsbot.com/blog/2AeuRL/chi-square-feature-selection-in-python """

    
class ChiSquare:
    def __init__(self, dataframe):
        self.df = dataframe
        self.p = None #P-Value
        self.chi2 = None #Chi Test Statistic
        self.dof = None
        
        self.dfObserved = None
        self.dfExpected = None
        
    def _print_chisquare_result(self, colX, alpha):
        result = ""
        if self.p<alpha:
            result="The column {0} is IMPORTANT for Prediction".format(colX)
        else:
            result="The column {0} is NOT an important predictor. (Discard {0} from model)".format(colX)

        print(result)
        
    def TestIndependence(self,colX,colY, alpha=0.05):
        X = self.df[colX].astype(str)
        Y = self.df[colY].astype(str)
        
        self.dfObserved = pd.crosstab(Y,X) 
        chi2, p, dof, expected = stats.chi2_contingency(self.dfObserved.values)
        self.p = p
        self.chi2 = chi2
        self.dof = dof 
        
        self.dfExpected = pd.DataFrame(expected, columns=self.dfObserved.columns, 
                                       index = self.dfObserved.index)
        
        self._print_chisquare_result(colX,alpha)

#Initialize ChiSquare Class
cT = ChiSquare(df)

#Feature Selection
testColumns = ['accident_index', '1st_road_class', '1st_road_number','2nd_road_number', 
               'carriageway_hazards', 'date', 'day_of_week', 
               'did_police_officer_attend_scene_of_accident','junction_control', 
               'junction_detail', 'latitude', 'light_conditions', 'local_authority_district',
               'local_authority_highway', 'longitude','lsoa_of_accident_location', 
               'number_of_casualties', 'number_of_vehicles', 'pedestrian_crossing-human_control',
               'pedestrian_crossing-physical_facilities', 'police_force','road_surface_conditions', 
               'road_type', 'special_conditions_at_site', 'speed_limit', 'time', 
               'urban_or_rural_area', 'weather_conditions', 'year', 'inscotland', 
               'age_band_of_driver', 'age_of_vehicle', 'driver_home_area_type', 
               'driver_imd_decile', 'engine_capacity_cc','hit_object_in_carriageway', 
               'hit_object_off_carriageway', 'journey_purpose_of_driver', 'junction_location', 
               'make', 'model','propulsion_code', 'sex_of_driver', 'skidding_and_overturning',
               'towing_and_articulation', 'vehicle_leaving_carriageway',
               'vehicle_locationrestricted_lane', 'vehicle_manoeuvre','vehicle_reference',
               'vehicle_type', 'was_vehicle_left_hand_drive', 'x1st_point_of_impact', 'month',
               'weekend', 'hour', 'time_of_day','season', 'engine_capacity_cc_size']
for var in testColumns:
    cT.TestIndependence(colX=var,colY="accident_seriousness" )  
The column accident_index is IMPORTANT for Prediction
The column 1st_road_class is IMPORTANT for Prediction
The column 1st_road_number is IMPORTANT for Prediction
The column 2nd_road_number is IMPORTANT for Prediction
The column carriageway_hazards is IMPORTANT for Prediction
The column date is IMPORTANT for Prediction
The column day_of_week is IMPORTANT for Prediction
The column did_police_officer_attend_scene_of_accident is IMPORTANT for Prediction
The column junction_control is IMPORTANT for Prediction
The column junction_detail is IMPORTANT for Prediction
The column latitude is IMPORTANT for Prediction
The column light_conditions is IMPORTANT for Prediction
The column local_authority_district is IMPORTANT for Prediction
The column local_authority_highway is IMPORTANT for Prediction
The column longitude is IMPORTANT for Prediction
The column lsoa_of_accident_location is IMPORTANT for Prediction
The column number_of_casualties is IMPORTANT for Prediction
The column number_of_vehicles is IMPORTANT for Prediction
The column pedestrian_crossing-human_control is IMPORTANT for Prediction
The column pedestrian_crossing-physical_facilities is IMPORTANT for Prediction
The column police_force is IMPORTANT for Prediction
The column road_surface_conditions is IMPORTANT for Prediction
The column road_type is IMPORTANT for Prediction
The column special_conditions_at_site is IMPORTANT for Prediction
The column speed_limit is IMPORTANT for Prediction
The column time is IMPORTANT for Prediction
The column urban_or_rural_area is IMPORTANT for Prediction
The column weather_conditions is IMPORTANT for Prediction
The column year is IMPORTANT for Prediction
The column inscotland is IMPORTANT for Prediction
The column age_band_of_driver is IMPORTANT for Prediction
The column age_of_vehicle is IMPORTANT for Prediction
The column driver_home_area_type is IMPORTANT for Prediction
The column driver_imd_decile is IMPORTANT for Prediction
The column engine_capacity_cc is IMPORTANT for Prediction
The column hit_object_in_carriageway is IMPORTANT for Prediction
The column hit_object_off_carriageway is IMPORTANT for Prediction
The column journey_purpose_of_driver is IMPORTANT for Prediction
The column junction_location is IMPORTANT for Prediction
The column make is IMPORTANT for Prediction
The column model is IMPORTANT for Prediction
The column propulsion_code is IMPORTANT for Prediction
The column sex_of_driver is IMPORTANT for Prediction
The column skidding_and_overturning is IMPORTANT for Prediction
The column towing_and_articulation is IMPORTANT for Prediction
The column vehicle_leaving_carriageway is IMPORTANT for Prediction
The column vehicle_locationrestricted_lane is IMPORTANT for Prediction
The column vehicle_manoeuvre is IMPORTANT for Prediction
The column vehicle_reference is IMPORTANT for Prediction
The column vehicle_type is IMPORTANT for Prediction
The column was_vehicle_left_hand_drive is IMPORTANT for Prediction
The column x1st_point_of_impact is IMPORTANT for Prediction
The column month is IMPORTANT for Prediction
The column weekend is IMPORTANT for Prediction
The column hour is IMPORTANT for Prediction
The column time_of_day is IMPORTANT for Prediction
The column season is IMPORTANT for Prediction
The column engine_capacity_cc_size is IMPORTANT for Prediction

Visualizations In Relation to Accident Seriousness

Method:

For my visualizations I have decided to use some of the features with the highest correlations to accident_seriousness:

  • did_police_officer_attend_scene_of_accident
  • x1st_point_of_impact
  • number_of_vehicles
  • speed_limit
  • urban_or_rural_area
  • skidding_and_overturning
  • vehicle_leaving_carriageway
  • sex_of_driver
  • vehicle_type
  • vehicle_manoeuvre
  • engine_capacity_cc
  • number_of_casualties
  • driver_home_area_type
  • age_band_of_driver
  • junction_control
  • hit_object_off_carriageway
  • hit_object_in_carriageway
  • driver_imd_decile *
  • junction_detail *
  • junction_location *
  • propulsion_code *
  • year *

Note: The columns used were selected because of the absolute value of their correlation in relation to accident_seriousness

* Columns added after the correlation was rerun following undersampling
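
As a sketch of what selecting by absolute value looks like in code, the snippet below ranks a handful of the Pearson coefficients reported earlier by magnitude. The Series is typed in by hand from that output, and the cutoff of three features is an arbitrary choice for illustration.

```python
import pandas as pd

# A few coefficients copied from the Pearson output above
corr = pd.Series({
    "vehicle_type": 0.133941,
    "did_police_officer_attend_scene_of_accident": -0.086731,
    "number_of_vehicles": -0.066120,
    "speed_limit": 0.065554,
    "month": 0.006929,
})

# Order by magnitude while keeping the original signs
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
top3 = ranked.head(3)
print(top3)
```

Sorting on the absolute value keeps strongly negative features such as did_police_officer_attend_scene_of_accident near the top, where a plain descending sort would place them last.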

For visual reasons, two separate dataframes were created, for not serious and serious accidents. I wanted to better scale the data and for me, this was the simplest way of doing so without resampling.
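
The percentages shown on the plots below are shares within each group. As an alternative sketch (not the approach used in this notebook), the same within-group scaling can be computed in one step with pd.crosstab and normalize="index". The toy data is made up for illustration.

```python
import pandas as pd

# Made-up records: 8 not serious and 2 serious accidents
toy = pd.DataFrame({
    "accident_seriousness": ["Not Serious"] * 8 + ["Serious"] * 2,
    "urban_or_rural_area": ["Urban"] * 6 + ["Rural"] * 2 + ["Urban", "Rural"],
})

# normalize="index" divides each row by its own total, so every
# seriousness class sums to 100% regardless of how many rows it has
pct = pd.crosstab(
    toy["accident_seriousness"],
    toy["urban_or_rural_area"],
    normalize="index",
) * 100
print(pct)
```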

In [3]:
#dataframe where accidents are Not Serious (copy so later column assignments don't act on a slice of df)
not_serious = df[(df['accident_seriousness']=="Not Serious")].copy()
print("Not Serious Group Shape:", not_serious.shape)

not_serious.accident_seriousness.value_counts()
Not Serious Group Shape: (492804, 60)
Out[3]:
Not Serious    492804
Name: accident_seriousness, dtype: int64
In [4]:
#dataframe where accidents are Serious (copy so later column assignments don't act on a slice of df)
serious= df[(df['accident_seriousness']=="Serious")].copy()

print("Serious Group Shape:", serious.shape)
serious.accident_seriousness.value_counts()
Serious Group Shape: (68331, 60)
Out[4]:
Serious    68331
Name: accident_seriousness, dtype: int64
In [20]:
#map 1, 2, 3 in did_police_officer_attend_scene_of_accident with Yes, No,Self-reported
policeattend = {1: "Yes", 2:"No", 3:"Self-Reported"}
not_serious['did_police_officer_attend_scene_of_accident']=not_serious['did_police_officer_attend_scene_of_accident'].map(policeattend)
df['did_police_officer_attend_scene_of_accident']=df['did_police_officer_attend_scene_of_accident'].map(policeattend)
serious['did_police_officer_attend_scene_of_accident']=serious['did_police_officer_attend_scene_of_accident'].map(policeattend)
In [21]:
imddecile = {1:"Most deprived 10%", 2:"More deprived 10-20%", 3:"More deprived 20-30%", 
             4:"More deprived 30-40%", 5:"More deprived 40-50%", 6:"Less deprived 40-50%", 
             7:"Less deprived 30-40%", 8:"Less deprived 20-30%", 9:"Less deprived 10-20%", 
             10:"Least deprived 10%"}

not_serious['driver_imd_decile']=not_serious['driver_imd_decile'].map(imddecile)
df['driver_imd_decile']=df['driver_imd_decile'].map(imddecile)
serious['driver_imd_decile']=serious['driver_imd_decile'].map(imddecile)
In [22]:
#setups for adding frequencies to visualizations
dftotal= float(len(df))
nstotal= float(len(not_serious))
setotal= float(len(serious))

Did Police Officer Attend Scene Of Accident

In [101]:
#Did Police Officer Attend Scene Of Accident
plt.figure(figsize=(15,10))
ax = sns.countplot(x="did_police_officer_attend_scene_of_accident", hue="accident_seriousness",  
              palette="PuBu", data=not_serious)
plt.title("Did Police Officer Attend Scene Of Not Serious Accident",
          fontsize=20, fontweight="bold")
plt.style.use('dark_background')
plt.xlabel("\nAttendance", fontsize=15, fontweight="bold")
plt.legend(fontsize=15, bbox_to_anchor=(1.0, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber Attended", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.3f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('did_police_officer_attend_scene_of_accident_not_serious.png')
plt.show()


#Did Police Officer Attend Scene Of Accident
plt.figure(figsize=(15,10))
ax = sns.countplot(x="did_police_officer_attend_scene_of_accident", hue="accident_seriousness",  
              palette="PuBu", data=serious)
plt.title("Did Police Officer Attend Scene Of Serious Accident",
          fontsize=20, fontweight="bold")
plt.style.use('dark_background')
plt.xlabel("\nAttendance", fontsize=15, fontweight="bold")
plt.legend(fontsize=15, bbox_to_anchor=(1.0, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber Attended", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.3f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('did_police_officer_attend_scene_of_accident_serious.png')
plt.show()

First Point of Impact Vs Accident Seriousness

In [102]:
# First Point of Impact Vs Accident Seriousness (Not Serious)
fpoa_order =["Front", "Nearside", "Did not impact", "Back", "Offside"]
plt.figure(figsize=(20,10))
ax = sns.countplot(x="x1st_point_of_impact", hue="accident_seriousness", order=fpoa_order,  
              palette="PuBu", data=not_serious)
plt.title("First Point of Impact in Not Serious Accidents",fontsize=20,fontweight="bold")
plt.style.use('dark_background')
plt.xlabel("\nPoint of Impact", fontsize=15, fontweight="bold")
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nFirst Point of Impact Count", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('x1st_point_of_impact_not_serious.png')
plt.show()


# First Point of Impact Vs Accident Seriousness
plt.figure(figsize=(20,10))
ax = sns.countplot(x="x1st_point_of_impact", hue="accident_seriousness",  order=fpoa_order,
              palette="PuBu", data=serious)
plt.title("First Point of Impact in Serious Accidents",fontsize=20,fontweight="bold")
plt.style.use('dark_background')
plt.xlabel("\nPoint of Impact", fontsize=15, fontweight="bold")
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nFirst Point of Impact Count", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('x1st_point_of_impact_serious.png')
plt.show()

Number of Vehicles

In [103]:
#number of vehicles vs accidentseriousness
nov_order=["1","2", "3", "4+"]
#notserious
plt.figure(figsize=(20,10))
ax = sns.countplot(x="accident_seriousness", hue="number_of_vehicles", hue_order=nov_order,
              palette="GnBu_d", data=not_serious)

plt.style.use('dark_background')
plt.title("Number of Vehicles in Not Serious Accidents",
          fontsize=20, fontweight="bold")

plt.xlabel("\nNumber of Vehicles", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('number_of_vehicles_not_serious.png')
plt.show()



#serious
plt.figure(figsize=(20,10))
ax = sns.countplot(x="accident_seriousness", hue="number_of_vehicles", hue_order=nov_order,
              palette="GnBu_d", data=serious)
plt.style.use('dark_background')
plt.title("Number of Vehicles in Serious Accidents",
          fontsize=20, fontweight="bold")

plt.xlabel("\nNumber of Vehicles", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('number_of_vehicles_serious.png')
plt.show()

Speed Limit vs Accident Seriousness

In [111]:
#notserious
splt_order=[15.0, 20.0,30.0,40.0 ,50.0,60.0, 70.0]
#splt1_order=[20.0,30.0,40.0 ,50.0,60.0, 70.0]
plt.figure(figsize=(20,10))
ax = sns.countplot(x="speed_limit", hue="accident_seriousness", order=splt_order,
              palette="PuBu", data=not_serious)
plt.title("Speed Limit vs Not Serious Accidents",fontsize=20,fontweight="bold")
plt.style.use('dark_background')
plt.xlabel("\nSpeed Limits", fontsize=15, fontweight="bold")
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nCount", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.4f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('speed_limit_not_serious.png')
plt.show()

#serious
plt.figure(figsize=(20,10))
ax = sns.countplot(x="speed_limit", hue="accident_seriousness",
              palette="PuBu", data=serious)
plt.title("Speed Limit vs Serious Accidents",fontsize=20,fontweight="bold")
plt.style.use('dark_background')
plt.xlabel("\nSpeed Limits", fontsize=15, fontweight="bold")
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nCount", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.3f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('speed_limit_serious.png')
plt.show()
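The percentages printed above the bars are each bar's count divided by the size of its group. As a quick cross-check, the same shares can be tabulated without plotting. This is a sketch on a made-up toy frame; only the `speed_limit` column name comes from the notebook, and it assumes that `nstotal` and `setotal` equal the row counts of `not_serious` and `serious`.

```python
import pandas as pd

# Hypothetical toy data standing in for the `serious` dataframe.
toy = pd.DataFrame({"speed_limit": [30.0, 30.0, 30.0, 60.0, 70.0]})

# Share of rows at each speed limit, as a percentage of the whole frame.
share = (toy["speed_limit"]
         .value_counts(normalize=True)
         .sort_index()
         .mul(100)
         .round(2))
print(share)
```

Running the same expression on `not_serious` and `serious` would give one table per group, which is easier to compare than two separate charts with different y-axis scales.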

Urban or Rural Area vs Accident Seriousness

In [112]:
#urban_or_rural_area vs accident seriousness
plt.figure(figsize=(20,10))
ax = sns.countplot(x="accident_seriousness",  hue="urban_or_rural_area",
              palette="PuBu", data=not_serious)
plt.title("Urban or Rural Area in Not Serious Accidents",fontsize=20,fontweight="bold")
plt.style.use('dark_background')
plt.xlabel("\nSeverity", fontsize=15, fontweight="bold")
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nUrban or Rural Area Count", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('urban_or_rural_area_not_serious.png')
plt.show()

#urban_or_rural_area vs accident seriousness (serious)
plt.figure(figsize=(20,10))
ax = sns.countplot(x="accident_seriousness",  hue="urban_or_rural_area",
              palette="PuBu", data=serious)
plt.title("Urban or Rural Area in Serious Accidents",fontsize=20,fontweight="bold")
plt.style.use('dark_background')
plt.xlabel("\nSeverity", fontsize=15, fontweight="bold")
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nUrban or Rural Area Count", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('urban_or_rural_area_serious.png')
plt.show()

Skidding and Overturning vs Seriousness

In [116]:
#Not Serious Accident
sao_order=["None", "Skidded", "Skidded and overturned", "Overturned", "Jackknifed", 
           "Jackknifed and overturned"]

plt.figure(figsize=(15,10))
ax = sns.countplot(x="accident_seriousness", hue="skidding_and_overturning", hue_order=sao_order,
              palette="magma", data=not_serious)
plt.style.use('dark_background')
plt.title("Skidding and Overturning in Not Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Skidding and Overturning", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.3f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('skidding_and_overturning_not_serious.png')
plt.show()


#Serious Accident
plt.figure(figsize=(15,10))
ax= sns.countplot(x="accident_seriousness", hue="skidding_and_overturning", hue_order=sao_order,
              palette="magma", data=serious)
plt.style.use('dark_background')
plt.title("Skidding and Overturning in Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Skidding and Overturning", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.3f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('skidding_and_overturning_serious.png')
plt.show()

Vehicle Leaving Carriageway vs Seriousness

In [118]:
#Not Serious Accident
vlc_order=["Did not leave carriageway", "Straight ahead at junction", "Nearside", 
           "Offside", "Offside on to central reservation", "Nearside and rebounded", 
           "Offside - crossed central reservation", "Offside and rebounded", 
           "Offside on to centrl res + rebounded"]

plt.figure(figsize=(15,10))
ax=sns.countplot(x="accident_seriousness", hue="vehicle_leaving_carriageway", hue_order=vlc_order,
              palette="plasma", data=not_serious)
plt.style.use('dark_background')
plt.title("Vehicle Leaving Carriageway in Not Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Vehicle Leaving Carriageway", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents\n", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.3f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('vehicle_leaving_carriageway_not_serious.png')
plt.show()


#Serious Accident
plt.figure(figsize=(15,10))
ax=sns.countplot(x="accident_seriousness", hue="vehicle_leaving_carriageway", hue_order=vlc_order,
              palette="plasma", data=serious)
plt.style.use('dark_background')
plt.title("Vehicle Leaving Carriageway in Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Vehicle Leaving Carriageway", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents\n", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.3f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('vehicle_leaving_carriageway_serious.png')
plt.show()

Sex of Driver vs Seriousness

In [121]:
#sex_of_driver
sod_order=["Female", "Male", "Not known"]
plt.figure(figsize=(15,10))
ax=sns.countplot(x="accident_seriousness", hue="sex_of_driver", hue_order=sod_order,
              palette="magma", data=not_serious)
plt.style.use('dark_background')
plt.title("Sex of Driver in Not Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nSex of Driver", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('sex_of_driver_not_serious.png')
plt.show()

#sex_of_driver serious
plt.figure(figsize=(15,10))
ax=sns.countplot(x="accident_seriousness", hue="sex_of_driver", hue_order=sod_order,
              palette="magma", data=serious)
plt.style.use('dark_background')
plt.title("Sex of Driver in Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nSex of Driver", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('sex_of_driver_serious.png')
plt.show()
In [122]:
#sex_of_driver
df['sex_of_driver'].value_counts()/df.shape[0]*100
Out[122]:
Male         62.289645
Female       37.562262
Not known     0.148093
Name: sex_of_driver, dtype: float64
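The shares above describe all drivers together. To see whether the split differs between severity groups, a row-normalised cross-tabulation and a chi-square test of independence (`chi2_contingency` is already imported at the top of the notebook) can be used. The sketch below runs on a small made-up frame; the column names follow the notebook, while the label values `"Serious"` and `"Not Serious"` are assumptions.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data; the real analysis would use `df` instead.
toy = pd.DataFrame({
    "accident_seriousness": ["Serious"] * 4 + ["Not Serious"] * 6,
    "sex_of_driver": ["Male", "Male", "Male", "Female",
                      "Male", "Male", "Male", "Female", "Female", "Female"],
})

# Raw counts per (severity, sex) cell.
counts = pd.crosstab(toy["accident_seriousness"], toy["sex_of_driver"])

# Percentage of each sex within every severity group (rows sum to 100).
row_pct = pd.crosstab(toy["accident_seriousness"], toy["sex_of_driver"],
                      normalize="index") * 100

# Chi-square test of independence on the raw counts.
chi2, p_value, dof, expected = chi2_contingency(counts)
print(row_pct.round(2))
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

On ten made-up rows the test result carries no meaning; the point is only the shape of the calculation. On the full data, a small p-value would suggest that the sex distribution is not the same across severity groups, though it would say nothing about cause.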

Vehicle Type vs Seriousness

In [126]:
#Not Serious Accident Type
vt_order=['Bus', 'Car', 'Goods Vehicle', 'Motorcycle', 'Other Vehicle']

plt.figure(figsize=(15,10))
ax=sns.countplot(x="accident_seriousness", hue="vehicle_type", hue_order=vt_order,
                 palette="tab20", data=not_serious)
plt.style.use('dark_background')
plt.title("Vehicle Type in Not Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accidents by Vehicle Type", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('vehicle_type_not_serious.png')
plt.show()


#Serious Accident Type
plt.figure(figsize=(15,10))
ax=sns.countplot(x="accident_seriousness", hue="vehicle_type", hue_order=vt_order,
              palette="tab20", data=serious)
plt.style.use('dark_background')
plt.title("Vehicle Type in Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accidents by Vehicle Type", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('vehicle_type_serious.png')
plt.show()

Vehicle Manoeuvres

In [128]:
#Not Serious Accident Manoeuvres

vm_order=['Turning right', 'Going ahead other', 'Going ahead right-hand bend',
          'Slowing or stopping', 'Turning left', 'Waiting to go - held up',
          'Waiting to turn right', 'Overtaking static vehicle - offside' ,
          'Parked', 'Overtaking - nearside', 'U-turn', 'Changing lane to right', 
          'Reversing', 'Waiting to turn left', 'Changing lane to left',
          'Going ahead left-hand bend', 'Overtaking moving vehicle - offside', 'Moving off']

plt.figure(figsize=(20,10))
ax=sns.countplot(x="accident_seriousness", hue="vehicle_manoeuvre", hue_order=vm_order,
              palette="tab20", data=not_serious)
plt.style.use('dark_background')
plt.title("Vehicle Manoeuvres in Not Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Vehicle Manoeuvres", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('vehicle_manoeuvre_not_serious.png')
plt.show()


#Serious Accident Manoeuvres
plt.figure(figsize=(20,10))
ax=sns.countplot(x="accident_seriousness", hue="vehicle_manoeuvre",hue_order=vm_order,
              palette="tab20", data=serious)
plt.style.use('dark_background')
plt.title("Vehicle Manoeuvres in Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Vehicle Manoeuvres", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold") 
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('vehicle_manoeuvre_serious.png')
plt.show()

Driver Home Type Area

In [130]:
#driver_home_area_type
dhoa_order=['Urban area', 'Rural', 'Small town']
#Not Serious Accident Driver Home Type Area
plt.figure(figsize=(20,15))
ax= sns.countplot(x="accident_seriousness", hue="driver_home_area_type", hue_order=dhoa_order,
              palette="rainbow", data=not_serious)

plt.style.use('dark_background')
plt.title("Accident Driver Home Type Area in Not Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nSeriousness", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
#plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('driver_home_area_type_not_serious.png')
plt.show()


#driver_home_area_type

#Serious Accident Driver Home Type Area
plt.figure(figsize=(20,15))
ax= sns.countplot(x="accident_seriousness", hue="driver_home_area_type", hue_order=dhoa_order,
              palette="rainbow", data=serious)

plt.style.use('dark_background')
plt.title("Accident Driver Home Type Area in Serious Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nSeriousness", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
#plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('driver_home_area_type_serious.png')
plt.show()

Age Band of Driver

In [131]:
#age_band_of_driver
abod_order=['Under 16', '16-25', '26-45', '46-65','Over 65']
#Not Serious Accident age_band_of_driver
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="age_band_of_driver", hue_order=abod_order,
              palette="magma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Age Band of Driver",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Age Band of Driver", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
#plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('age_band_of_driver_not_serious.png')
plt.show()


#Serious Accident age_band_of_driver
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="age_band_of_driver", hue_order=abod_order,
              palette="magma", data=serious)

plt.style.use('dark_background')
plt.title("Serious Accident by Age Band of Driver",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Age Band of Driver", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
#plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('age_band_of_driver_serious.png')
plt.show()

Junction Control

In [133]:
#junction_control
jc_order = ['Give way or uncontrolled', 'Auto traffic signal', 'Authorised person',
            'Stop sign','Not at junction or within 20 metres']
#Not Serious Accident junction_control
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="junction_control", hue_order=jc_order,
              palette="magma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Junction Control",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Junction Control", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('junction_control_not_serious.png')
plt.show()

#Serious Accident junction_control
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="junction_control",hue_order=jc_order,
              palette="magma", data=serious)

plt.style.use('dark_background')
plt.title("Serious Accident by Junction Control",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Junction Control", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('junction_control_serious.png')
plt.show()

Hit Object Off Carriageway

In [135]:
#hit_object_off_carriageway
hooffc_order=['None', 'Lamp post', 'Road sign or traffic signal', 'Other permanent object',
              'Entered ditch', 'Tree', 'Near/Offside crash barrier','Central crash barrier',
              'Bus stop or bus shelter', 'Telegraph or electricity pole', 'Submerged in water',
              'Wall or fence']
#Not Serious Accident hit_object_off_carriageway
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="hit_object_off_carriageway", hue_order=hooffc_order,
              palette="plasma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Hit Object Off Carriageway",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Hit Object Off Carriageway", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('hit_object_off_carriageway_not_serious.png')
plt.show()

#Serious Accident hit_object_off_carriageway
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="hit_object_off_carriageway", hue_order=hooffc_order,
              palette="plasma", data=serious)
plt.style.use('dark_background')
plt.title("Serious Accident by Hit Object Off Carriageway",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Hit Object Off Carriageway", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('hit_object_off_carriageway_serious.png')
plt.show()

Hit Object In Carriageway

In [222]:
#hit_object_in_carriageway
hoinc_order=['None', 'Kerb', 'Other object', 'Bollard or refuge', 'Parked vehicle',
             'Road works', 'Open door of vehicle', 'Central island of roundabout',
             'Previous accident', 'Bridge (side)', 'Any animal (except ridden horse)',
             'Bridge (roof)']
#Not Serious Accident hit_object_in_carriageway
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="hit_object_in_carriageway", hue_order=hoinc_order,
              palette="plasma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Hit Object in Carriageway",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Hit Object in Carriageway", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('hit_object_in_carriageway_not_serious.png')
plt.show()

#Serious Accident hit_object_in_carriageway
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="hit_object_in_carriageway", hue_order=hoinc_order,
              palette="plasma", data=serious)
plt.style.use('dark_background')
plt.title("Serious Accident by Hit Object in Carriageway",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Hit Object in Carriageway", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('hit_object_in_carriageway_serious.png')
plt.show()

Driver IMD Decile

In [221]:
#driver_imd_decile
imd_order=["Least deprived 10%", "Less deprived 10-20%", "Less deprived 20-30%", 
           "Less deprived 30-40%","Less deprived 40-50%","Most deprived 10%",
           "More deprived 10-20%", "More deprived 20-30%", "More deprived 30-40%",
           "More deprived 40-50%"]
#Not Serious Accident driver_imd_decile
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="driver_imd_decile", hue_order=imd_order,
              palette="plasma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Driver Area Deprivation Score",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Driver Area Deprivation Score", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('driver_imd_decile_not_serious.png')
plt.show()


#Serious Accident driver_imd_decile
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="driver_imd_decile", hue_order=imd_order,
              palette="plasma", data=serious)

plt.style.use('dark_background')
plt.title("Serious Accident by Driver Area Deprivation Score",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Driver Area Deprivation Score", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('driver_imd_decile_serious.png')
plt.show()

Junction Detail

In [210]:
#junction_detail
jud_order=['T or staggered junction', 'Mini-roundabout', 'Crossroads',
           'Private drive or entrance', 'More than 4 arms (not roundabout)',
           'Roundabout', 'Slip road', 'Other junction','Not at junction or within 20 metres']
#Not Serious Accident junction_detail
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="junction_detail", hue_order=jud_order,
              palette="plasma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Junction Detail",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Junction Detail", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('junction_detail_not_serious.png')
plt.show()


#Serious Accident junction_detail
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="junction_detail", hue_order=jud_order,
              palette="plasma", data=serious)

plt.style.use('dark_background')
plt.title("Serious Accident by Junction Detail",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Junction Detail", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('junction_detail_serious.png')
plt.show()
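
The percentage-labelling loop is repeated in every cell of this section. As a sketch only, it could be factored into one helper. The name `annotate_percentages` and its parameters are hypothetical and not part of the original notebook, and skipping empty bars is an added design choice: a hue level with no records may otherwise produce a "nan%" label.

```python
import math
from matplotlib.figure import Figure

def annotate_percentages(ax, total, offset=3, fontsize=15):
    """Label each bar in `ax` with its height as a percentage of `total`.

    Bars with a zero or NaN height are skipped (an assumption of this sketch).
    Returns the number of labels that were drawn.
    """
    labelled = 0
    for p in ax.patches:
        height = p.get_height()
        if math.isnan(height) or height == 0:
            continue
        ax.text(p.get_x() + p.get_width() / 2.,
                height + offset,
                '{:1.2f}%'.format(height / total * 100),
                ha="center", fontsize=fontsize)
        labelled += 1
    return labelled

# Toy demo on a plain bar chart; Figure() is used so no display is needed
fig = Figure()
ax = fig.subplots()
ax.bar(["a", "b", "c"], [6, 2, 0])
n_labels = annotate_percentages(ax, total=8)
label_texts = [t.get_text() for t in ax.texts]
```

In the cells of this section the call would be `annotate_percentages(ax, nstotal)` or `annotate_percentages(ax, setotal)` in place of the `for p in ax.patches:` loop.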

Junction Location

In [211]:
#junction_location
jul_order=['Mid Junction - on roundabout or on main road', 'Entering main road',
           'Approaching junction or waiting/parked at junction approach',
           'Cleared junction or waiting/parked at junction exit', 'Leaving main road',
           'Leaving roundabout', 'Entering roundabout', 'Entering from slip road',
           'Not at or within 20 metres of junction']
#Not Serious Accident junction_location
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="junction_location", hue_order=jul_order,
              palette="plasma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Junction Location",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Junction Location", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('junction_location_not_serious.png')
plt.show()


#Serious Accident junction_location
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="junction_location", hue_order=jul_order,
              palette="plasma", data=serious)

plt.style.use('dark_background')
plt.title("Serious Accident by Junction Location",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Junction Location", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('junction_location_serious.png')
plt.show()

Propulsion Code

In [218]:
#propulsion_code
pd_order=['Petrol', 'Heavy oil', 'Hybrid electric', 'Bio-fuel', 'LPG Petrol', 'Diesel',
          'Fuel cells', 'New fuel technology', 'Electric diesel']
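#serious accidents have no Diesel, Fuel cells or New fuel technology records, so those levels are left out
pd_order2=['Petrol', 'Heavy oil', 'Hybrid electric', 'Bio-fuel', 'LPG Petrol', 'Electric diesel']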
#Not Serious Accident propulsion_code
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="propulsion_code", hue_order=pd_order,
              palette="plasma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Propulsion Code",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Propulsion Code", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('propulsion_code_not_serious.png')
plt.show()


#Serious Accident propulsion_code
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="propulsion_code", hue_order=pd_order2,
              palette="plasma", data=serious)

plt.style.use('dark_background')
plt.title("Serious Accident by Propulsion Code",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Propulsion Code", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('propulsion_code_serious.png')
plt.show()

Year

In [230]:
#year
year_order=[2010, 2011, 2012, 2013, 2014, 2015, 2016]

#Not Serious Accident year
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="year", hue_order=year_order,
              palette="plasma", data=not_serious)

plt.style.use('dark_background')
plt.title("Not Serious Accident by Year",fontsize=25,fontweight="bold")
plt.xlabel("\nNot Serious Accident by Year", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/nstotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('year_not_serious.png')
plt.show()


#Serious Accident year
plt.figure(figsize=(20,15))
ax=sns.countplot(x="accident_seriousness", hue="year", hue_order=year_order,
              palette="plasma", data=serious)

plt.style.use('dark_background')
plt.title("Serious Accident by Year",fontsize=25,fontweight="bold")
plt.xlabel("\nSerious Accident by Year", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 3,
            '{:1.2f}%'.format(height/setotal*100),
            ha="center",fontsize=15) 
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=True)
plt.savefig('year_serious.png')
plt.show()

Visualization Summary

  • did_police_officer_attend_scene_of_accident: Police attended most accidents, and non-attendance was less common in serious accidents than in not serious ones.
  • x1st_point_of_impact: In the majority of accidents the first point of impact was the front of the vehicle. Not serious accidents had a higher percentage of back impacts than serious accidents, while serious accidents had higher percentages of offside and nearside impacts.
  • number_of_vehicles: Nothing significant.
  • speed_limit: The majority of accidents occurred in 30 mph speed limit zones. It would have been beneficial to have data on the actual speeds of the vehicles involved, or at least on whether they were speeding.
  • urban_or_rural_area: Rural areas had a higher percentage of serious accidents. This may relate to hospital locations or emergency vehicle arrival data which was not available.
  • skidding_and_overturning: Higher percentages of serious accidents involved skidding, jackknifing or overturning.
  • vehicle_leaving_carriageway: Most vehicles did not leave the carriageway in either type of accident; however, serious accidents had a higher percentage of vehicles that did leave the carriageway.
  • sex_of_driver: Men were involved in more of both serious and not serious accidents; however, according to racfoundation.org, only 35% of privately registered cars on UK roads are registered to women.
  • vehicle_type: Motorcycles were involved in a significantly higher percentage of serious accidents than of not serious accidents.
  • vehicle_manoeuvre: Nothing significant.
  • driver_home_area_type: Rural areas and small towns had higher percentages of serious accidents. This may relate to hospital locations or emergency vehicle arrival data which was not available.
  • age_band_of_driver: The age bands over 25 had a higher percentage of serious accidents than of not serious accidents.
  • junction_control: Most accidents occurred in uncontrolled areas.
  • hit_object_off_carriageway: The majority of accidents did not involve hitting an object off the carriageway; however, serious accidents had a higher percentage of accidents that did.
  • hit_object_in_carriageway: Most accidents did not involve hitting an object in the carriageway; however, serious accidents had a higher percentage of accidents that did.
  • driver_imd_decile: Nothing significant. Most accidents fell in the "Less deprived 20-30%" decile.
  • junction_detail: Most accidents occurred at T or staggered junctions.
  • junction_location: Nothing separates the two severity groups. However, most accidents occurred mid junction (on a roundabout or on a main road) or where the driver was approaching a junction or waiting/parked at a junction approach.
  • propulsion_code: Diesel, Fuel cells and New fuel technology vehicles were not recorded in serious accidents.
  • year: There has been a spike in the percentage of serious accidents over the years, while the percentage of not serious accidents has remained fairly consistent.
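
The comparisons in this summary rest on shares within each severity group, not raw counts. A minimal sketch of that comparison follows. The helper name `category_shares` and the toy labels are hypothetical and not taken from the dataset. `Counter` is already imported at the top of the notebook.

```python
from collections import Counter

def category_shares(values):
    """Return {category: percentage of all values}, rounded to 2 decimals."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: round(100 * n / total, 2) for cat, n in counts.items()}

# Toy labels (hypothetical, for illustration only)
not_serious_impacts = ["Front", "Front", "Back", "Back", "Offside"]
serious_impacts = ["Front", "Front", "Offside", "Nearside"]

ns_shares = category_shares(not_serious_impacts)
se_shares = category_shares(serious_impacts)

# Percentage-point gap (serious minus not serious) for every category seen
gaps = {cat: round(se_shares.get(cat, 0.0) - ns_shares.get(cat, 0.0), 2)
        for cat in sorted(set(ns_shares) | set(se_shares))}
print(gaps)
```

A positive gap means the category is over-represented among serious accidents relative to not serious ones.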

Other Visualizations

Following the previous visualizations, several pairs of variables were compared against each other to look for further relationships.

  • Junction Control by Junction Detail
  • Junction Control by Junction Location
  • First point of Impact by Junction Detail
  • First point of Impact by Junction Location
  • Junction Control and First Point of Impact

Junction Control by Junction Detail

In [246]:
#Junction control by junction detail (all accidents)
plt.figure(figsize=(20,15))
ax=sns.countplot(x="junction_control", hue="junction_detail",
              palette="plasma", data=df)

plt.style.use('dark_background')
plt.title("Junction Control by Junction Detail",fontsize=25,fontweight="bold")
plt.xlabel("\nJunction Control", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")

plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
# plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=False)
plt.savefig('junction_control_by_junction_detail.png')
plt.show()

Junction Control by Junction Location

In [245]:
plt.figure(figsize=(20,15))
ax=sns.countplot(x="junction_control", hue="junction_location",
              palette="plasma", data=df)

plt.style.use('dark_background')
plt.title("Junction Control by Junction Location in Accidents",fontsize=25,fontweight="bold")
plt.xlabel("\nJunction Control", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")

plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
# plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=False)
plt.savefig('junction_control_by_junction_location.png')
plt.show()

First point of Impact by Junction Detail

In [248]:
plt.figure(figsize=(20,15))
ax=sns.countplot(x="x1st_point_of_impact", hue="junction_detail",
              palette="plasma", data=df)

plt.style.use('dark_background')
plt.title("First point of Impact by Junction Detail",fontsize=25,fontweight="bold")
plt.xlabel("\nFirst Point of Impact", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")

plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
# plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=False)
plt.savefig('x1st_point_of_impact_by_junction_detail.png')
plt.show()

First point of Impact by Junction Location

In [247]:
plt.figure(figsize=(20,15))
ax=sns.countplot(x="x1st_point_of_impact", hue="junction_location",
              palette="plasma", data=df)

plt.style.use('dark_background')
plt.title("First point of Impact by Junction Location",fontsize=25,fontweight="bold")
plt.xlabel("\nFirst Point of Impact", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")

plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
# plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=False)
plt.savefig('x1st_point_of_impact_by_junction_location.png')
plt.show()

Junction Control and First Point of Impact

In [249]:
plt.figure(figsize=(20,15))
ax=sns.countplot(x="x1st_point_of_impact", hue="junction_control",
              palette="plasma", data=df)

plt.style.use('dark_background')
plt.title("First point of Impact by Junction Control",fontsize=25,fontweight="bold")
plt.xlabel("\nFirst Point of Impact", fontsize=15, fontweight="bold")
plt.legend().set_title('')
plt.legend(fontsize='22', loc = 'upper right')
plt.ylabel("\nNumber of Accidents", fontsize=15, fontweight="bold")

plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.legend(fontsize='15', bbox_to_anchor=(1.04, 1), loc='upper right', ncol=1)
# plt.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
sns.despine(top=True, right=True, left=True, bottom=False)
plt.savefig('x1st_point_of_impact_by_junction_control.png')
plt.show()

Other Visualizations Summary

In every comparison above, most accidents occurred in areas that were uncontrolled. T or staggered junctions were among the main junction details involved.

Other areas of concern include mid junctions on roundabouts or main roads, and junction approaches where cars were either parked or waiting.

Solution

From the data above, more controlled areas would be beneficial. Signs alerting drivers to upcoming junctions, traffic lights, or stop signs might help in the areas where they are feasible.

staggered-junctions.jpg

For example, this is a staggered junction, the junction detail most common in accidents. One can understand how a layout such as this can lead to numerous accidents, especially if proper signage is not in place. Perhaps traffic lights, stop signs, or warnings that drivers are approaching such a junction would help reduce accidents.

Web Scraping

Below you will find a web scrape of the Learner Driving Centres website, which contains information on road signs in the UK. The signs were pulled to show examples of the signage available to be placed.

In [5]:
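#imports needed for the scrape (harmless if already imported above)
import requests
from bs4 import BeautifulSoup

#request website
r = requests.get('https://www.learnerdriving.com/learn-to-drive/highway-code/road-signs')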

#parse HTML
soup = BeautifulSoup(r.text, 'html.parser')

#filter results
results = soup.find_all('div', attrs={'class':'fifth'})
In [6]:
#done to find specific results area
first_result=results[0]
first_result
first_result.find('img')['src']
Out[6]:
'/images/highway-code/entry-to-20-mph-zone.png'
In [7]:
#get images of signs and sign descriptions 
signage = []
for result in results:
    sign=result.find('img')['src']
    sign_desc=result.contents[1]
    signage.append((sign, sign_desc))
In [8]:
#put pulled UK Traffic Signs into dataframe
uktrafficsigns = pd.DataFrame(signage, columns=['Sign', 'Sign Description'])
uktrafficsigns.head()
Out[8]:
Sign Sign Description
0 /images/highway-code/entry-to-20-mph-zone.png Entry to 20 mph zone
1 /images/highway-code/end-of-20-mph-zone.png End of 20 mph zone
2 /images/highway-code/maximum-speed.png Maximum speed
3 /images/highway-code/national-speed-limit-appl... National speed limit applies
4 /images/highway-code/school-crossing-patrol.png School crossing patrol
In [9]:
'''
the scraped "src" is only the relative part of the image link;
the site root must be prepended in order to have the full image link
'''
uktrafficsigns['Sign'] = 'https://www.learnerdriving.com/'+uktrafficsigns['Sign'] 
uktrafficsigns.head()
Out[9]:
Sign Sign Description
0 https://www.learnerdriving.com//images/highway... Entry to 20 mph zone
1 https://www.learnerdriving.com//images/highway... End of 20 mph zone
2 https://www.learnerdriving.com//images/highway... Maximum speed
3 https://www.learnerdriving.com//images/highway... National speed limit applies
4 https://www.learnerdriving.com//images/highway... School crossing patrol
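
The concatenation above leaves a double slash after the host name, as the output shows. One possible alternative, shown only as a sketch, is `urljoin` from the standard library, which resolves a root-relative path against the site root. The names `BASE_URL` and `full_image_url` are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.learnerdriving.com/'

def full_image_url(relative_path, base=BASE_URL):
    """Resolve a scraped, root-relative image path against the site root."""
    return urljoin(base, relative_path)

example = full_image_url('/images/highway-code/entry-to-20-mph-zone.png')
print(example)
```

Applied to the frame before the prefix is added, this would read `uktrafficsigns['Sign'].apply(full_image_url)`.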
In [10]:
'''
In later cells I noticed that one description (at index 42) was blank but did not read as null.
To fix that, I set its "Sign Description" by hand and placed the fix here, before the data is saved and displayed.
'''
uktrafficsigns.at[42,'Sign Description']="T-junction with priority over vehicles from the right"
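
A field that looks blank without being null is often an empty or whitespace-only string. A hedged sketch of how such rows could be located, using a toy frame with hypothetical values (the scraped values may be of a different type):

```python
import pandas as pd

# Toy frame (hypothetical values): two blank strings and one real null
signs = pd.DataFrame({'Sign Description': ['Crossroads', '', '  ', None,
                                           'Staggered junction']})
desc = signs['Sign Description']

# Blank = not null, but nothing left once whitespace is stripped
is_blank = desc.notna() & desc.fillna('').str.strip().eq('')
blank_rows = signs.index[is_blank].tolist()
print(blank_rows)
```

A check like this reports the positions to fix, so an index such as 42 would not need to be spotted by eye.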
In [11]:
#I wanted to save this as a csv for later, and to stop unnecessary web scraping
uktrafficsigns.to_csv('uktrafficsigns.csv', header=False, index=False) 
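
Because the file is written with `header=False`, the column names have to be supplied again when it is read back. A small round-trip sketch, using an in-memory buffer and toy rows in place of the real file:

```python
import io
import pandas as pd

# Toy rows standing in for the scraped signs (hypothetical URLs)
signs = pd.DataFrame({
    'Sign': ['https://example.invalid/a.png', 'https://example.invalid/b.png'],
    'Sign Description': ['Crossroads', 'Staggered junction'],
})

buffer = io.StringIO()
signs.to_csv(buffer, header=False, index=False)  # same options as the cell above
buffer.seek(0)

# header=None keeps the first row as data; names restores the column labels
reloaded = pd.read_csv(buffer, header=None, names=['Sign', 'Sign Description'])
```

With the real file, the buffer would be replaced by the path `'uktrafficsigns.csv'`.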
In [12]:
#I wanted the html to show up as images instead of links
from IPython.display import HTML

def path_to_image_html(path):
    return '<img src="'+ path + '" width="60" >'

#None removes the column width limit (-1 is deprecated in newer pandas)
pd.set_option('display.max_colwidth', None)
ukts=HTML(uktrafficsigns.to_html(escape=False ,formatters=dict(Sign=path_to_image_html)))
ukts
Out[12]:
Sign Sign Description
0 Entry to 20 mph zone
1 End of 20 mph zone
2 Maximum speed
3 National speed limit applies
4 School crossing patrol
5 Stop and give way
6 Give way to traffic on major road
7 Manually operated temporary
8 STOP and GO signs
9 No entry for vehicular traffic
10 No vehicles except bicycles being pushed
11 No cycling
12 No motor vehicles
13 No buses (over 8 passenger seats)
14 No overtaking
15 No towed caravans
16 No vehicles carrying explosives
17 No vehicle or combination of vehicles over length shown
18 No vehicles over height shown
19 No vehicles over width shown
20 Give priority to vehicles from opposite direction
21 No right turn
22 No left turn
23 No U-turns
24 No goods vehicles over maximum gross weight shown (in tonnes) except for loading and unloading
25 Ahead only
26 Turn left ahead (right if symbol reversed)
27 Turn left (right if symbol reversed)
28 Keep left (right if symbol reversed)
29 Vehicles may pass either side to reach same destination
30 Mini-roundabout (roundabout circulation - give way to vehicles from the immediate right)
31 Route to be used by pedal cycles only
32 Segregated pedal cycle and pedestrian route
33 Minimum speed
34 End of minimum speed
35 Distance to 'STOP' line ahead
36 Dual carriage-way ends
37 Road narrows on right (left if symbol reversed)
38 Road narrows on both sides
39 Distance to 'Give Way' line ahead
40 Crossroads
41 Junction on bend ahead
42 T-junction with priority over vehicles from the right
43 Staggered junction
44 Traffic merging from left ahead
45 Double bend first to left (symbol may be reversed)
46 Bend to right (or left if symbol reversed)
47 Roundabout
48 Uneven road
49 Plate below some signs
50 Two-way traffic crosses one-way road
51 Two-way traffic straight ahead
52 Opening or swing bridge ahead
53 Low-flying aircraft or sudden aircraft noise
54 Falling or fallen rocks
55 Traffic signals not in use
56 Traffic signals
57 Slippery road
58 Steep hill downwards
59 Steep hill upwards
60 Tunnel ahead
61 Trams crossing ahead
62 Level crossing with barrier or gate ahead
63 Level crossing without barrier or gate ahead
64 Level crossing without barrier
65 School crossing patrol ahead (some signs have amber lights which flash when children are crossing)
66 Frail (or blind or disabled if shown) pedestrians likely to cross road ahead
67 Pedestrians in road ahead
68 Zebra crossing
69 Overhead electric cable; plate indicates maximum height of vehicles which can pass safely
70 Cattle
71 Wild animals
72 Wild horses or ponies
73 Accompanied horses or ponies
74 Cycle route ahead
75 Risk of ice
76 Traffic queues likely ahead
77 Distance over which road humps extend
78 Other danger; plate indicates nature of danger
79 Soft verges
80 Side winds
81 Hump bridge
82 Worded warning sign
83 Quayside or river bank
84 Risk of grounding
In [18]:
'''
Here I am creating a df that will allow me to pull all junction signs.
"nction" was used instead of "junction" so that both "Junction" and "junction" are matched.
'''
junction =uktrafficsigns[uktrafficsigns['Sign Description'].str.contains("nction", regex=False)]

#Making it its own HTML object (same as above)

def path_to_image_html(path):
    return '<img src="'+ path + '" width="60" >'

pd.set_option('display.max_colwidth', None)

HTML(junction.to_html(escape=False ,formatters=dict(Sign=path_to_image_html)))
Out[18]:
Sign Sign Description
41 Junction on bend ahead
42 T-junction with priority over vehicles from the right
43 Staggered junction
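
Searching for a fragment such as "nction" works, but it can also match unrelated words that contain the same letters. A sketch of an alternative, on toy descriptions, is to search for the whole word and pass `case=False` to `str.contains`:

```python
import pandas as pd

# Toy descriptions with mixed capitalisation
toy = pd.DataFrame({'Sign Description': [
    'Junction on bend ahead',
    'Staggered junction',
    'Roundabout',
    'T-junction with priority over vehicles from the right',
]})

# case=False matches "Junction" and "junction" alike; regex=False keeps it a plain substring
mask = toy['Sign Description'].str.contains('junction', case=False, regex=False)
junction_rows = toy[mask]
print(junction_rows.index.tolist())
```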
In [14]:
#Repeated the above steps for give way signs ("ive " matches both "Give" and "give")
give=uktrafficsigns[uktrafficsigns['Sign Description'].str.contains("ive ", regex=False)]
def path_to_image_html(path):
    return '<img src="'+ path + '" width="60" >'

pd.set_option('display.max_colwidth', None)

HTML(give.to_html(escape=False ,formatters=dict(Sign=path_to_image_html)))
Out[14]:
Sign Sign Description
5 Stop and give way
6 Give way to traffic on major road
20 Give priority to vehicles from opposite direction
30 Mini-roundabout (roundabout circulation - give way to vehicles from the immediate right)
39 Distance to 'Give Way' line ahead
In [15]:
#roundabouts
roundabout=uktrafficsigns[uktrafficsigns['Sign Description'].str.contains("ounda", regex=False)]

def path_to_image_html(path):
    return '<img src="'+ path + '" width="60" >'

pd.set_option('display.max_colwidth', None)

HTML(roundabout.to_html(escape=False ,formatters=dict(Sign=path_to_image_html)))
Out[15]:
Sign Sign Description
30 Mini-roundabout (roundabout circulation - give way to vehicles from the immediate right)
47 Roundabout
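
The three filtered tables above repeat the same formatter and display option. As a sketch, the rendering step could live in one function. `render_signs` is a hypothetical name, and `pd.option_context` is used here so the width setting applies only inside the call instead of globally.

```python
import pandas as pd

def path_to_image_html(path):
    return '<img src="' + path + '" width="60" >'

def render_signs(frame):
    """Return an HTML table string with the Sign column rendered as images."""
    # Lift the column width limit only for this call (None means no limit)
    with pd.option_context('display.max_colwidth', None):
        return frame.to_html(escape=False,
                             formatters={'Sign': path_to_image_html})

# Toy row with a hypothetical URL longer than the default column width
long_url = 'https://example.invalid/images/highway-code/' + 'x' * 40 + '.png'
toy = pd.DataFrame({'Sign': [long_url], 'Sign Description': ['Roundabout']})
html_table = render_signs(toy)
```

In the notebook the result would be wrapped for display, for example `HTML(render_signs(roundabout))`.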

Mapping of Problem Area

Below is a map of serious accidents in rural areas, without signage, at T or staggered junctions, in 2016. This was one of the more problematic combinations according to the plots above, so it is plotted below using latitude and longitude coordinates in order to look at areas that may need signage.

(The plot is drilled down this far for two reasons: 1) the level of detail and 2) the notebook's limited ability to handle and plot such a large dataset.)
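
The subset described above can be expressed as a single boolean filter. The sketch below runs on toy rows. The category labels ('Serious', 'Rural', 'Give way or uncontrolled') and the `latitude`/`longitude` column names are assumptions and may not match the actual values in the dataset.

```python
import pandas as pd

# Toy rows; every value is hypothetical and only mimics the dataset's columns
accidents = pd.DataFrame({
    'accident_seriousness': ['Serious', 'Serious', 'Not Serious', 'Serious'],
    'urban_or_rural_area': ['Rural', 'Urban', 'Rural', 'Rural'],
    'junction_detail': ['T or staggered junction', 'T or staggered junction',
                        'T or staggered junction', 'Crossroads'],
    'junction_control': ['Give way or uncontrolled'] * 4,
    'year': [2016, 2016, 2016, 2016],
    'latitude': [52.1, 51.5, 53.4, 54.0],
    'longitude': [-1.2, -0.1, -2.9, -1.5],
})

# Every condition must hold for a row to be kept
mask = (
    (accidents['accident_seriousness'] == 'Serious')
    & (accidents['urban_or_rural_area'] == 'Rural')
    & (accidents['junction_detail'] == 'T or staggered junction')
    & (accidents['junction_control'] == 'Give way or uncontrolled')
    & (accidents['year'] == 2016)
)

# Keep only the coordinates needed for plotting
problem_area = accidents.loc[mask, ['latitude', 'longitude']]
```

Filtering this narrowly before plotting also keeps the number of points small enough for the notebook to handle.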

In [2]:
%%HTML
<div class='tableauPlaceholder' id='viz1572070880044' style='position: relative'><noscript><a href='#'><img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ac&#47;AccidentForecasting&#47;SeriousAccidentsinAreaswithHighDeprivationandNoSignage&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='AccidentForecasting&#47;SeriousAccidentsinAreaswithHighDeprivationandNoSignage' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ac&#47;AccidentForecasting&#47;SeriousAccidentsinAreaswithHighDeprivationandNoSignage&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1572070880044');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.75)+'px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>